Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Add Retry Guide #10598

Merged
merged 20 commits into from
Apr 24, 2024
Merged

docs: Add Retry Guide #10598

merged 20 commits into from
Apr 24, 2024

Conversation

lqiu96
Copy link
Contributor

@lqiu96 lqiu96 commented Mar 21, 2024

Fixes: #10586

Add a section to the README.md to include basic information about how Google Cloud Java Client Libraries deal with retries.

README.md Outdated
Client Libraries use retries to handle unexpected, transient failures (i.e. Server temporarily unavailable).
Multiple attempts, hopefully, will result in a successful response from the server.

By default, retries are configured by the Cloud Service. The configured retry parameters are defined per RPC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add an example (the audience is client library users)?

Copy link
Member

@suztomo suztomo Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the audience's perspective, it's just "retries are controlled by the responses of the service." isn't it?

The configured retry parameters are defined per RPC.

If the audience doesn't have a way to read the definition, let's remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add an example (the audience is client library users)?

I'll move the Java-Asset example up to show the one service's default retry param values

From the audience's perspective, it's just "retries are controlled by the responses of the service." isn't it?

Controlled by response of the service and the time/ attempt bounds.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated
- RPC Timeout Multiplier: The change in RPC timeout. This value is multiplied by the previous call’s timeout to calculate the timeout for the next call.
- Max RPC Timeout: The limit of the RPC timeout, so that the RpcTimeoutMultiplier can't increase the RPC timeout higher than this amount

Note: It is recommended to only set _either_ the Max Attempt or the Total Timeout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of this recommendation, what's the reason behind it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection was that there was an internal doc that recommended setting only either param, but I can't find it (and perhaps my memory is wrong). Looking at old tickets, it seems that maxAttempts was never implemented for any language (Java seems to be the exception).

I see this ticket I created a while back regarding maxAttempts: googleapis/sdk-platform-java#2306, so perhaps this recommendation shouldn't exist.

README.md Outdated

See the official Google Cloud Docs: https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings

#### Jitter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should mention Jitter here because it is already deprecated and we should always set it to true.

Copy link
Contributor Author

@lqiu96 lqiu96 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think disabling the jitter flag is used quite a bit internally for tests to manage flakiness. I don't know how easily it would be to fully remove it in the future (perhaps we can just mark it with @VisibleForTests).

I mentioned this section since I think it has a bit of behavior impact on the number of retries. If a customer is counting (or using otel), they may be expecting a certain number of retry attempts. I left out the setJittered(false) configuration.

README.md Outdated
#### Retry
##### Example 1
```java
RetrySettings.newBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same RetrySettings for all examples? I think it would be easier to process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are slight differences in each RetrySetting. I can try and highlight the differences to make it clearer.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@lqiu96 lqiu96 added the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 22, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 22, 2024
@lqiu96 lqiu96 marked this pull request as ready for review March 22, 2024 17:35
@lqiu96 lqiu96 requested a review from a team as a code owner March 22, 2024 17:35
@lqiu96 lqiu96 changed the title docs: Add Retry Guide to README.md docs: Add Retry Guide Mar 22, 2024
docs/retries.md Outdated Show resolved Hide resolved
docs/retries.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@@ -0,0 +1,267 @@
# Client Side Retries
Client Libraries use retries to handle unexpected, transient failures (i.e. server temporarily unavailable).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider putting this in backticks server temporarily unavailable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I think it should be clearer now that it's not a status code or message back.

Client Libraries use retries to handle unexpected, transient failures (i.e. server temporarily unavailable).
Multiple attempts, hopefully, will result in a successful response from the server.

By default, retries are configured by the Cloud Service. The configured retry parameters are defined per RPC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider rewording Cloud Service to something like Default retry settings are configured per RPC by each backend cloud service. Probably me being paranoid, but Service is such an overloaded term, I don't want there to be confusion about a "new" proper noun

Copy link
Contributor Author

@lqiu96 lqiu96 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated. I tried rewording it to make it clearer. I didn't want to introduce the term backend, but I think it should make sense.

docs/client_retries.md Outdated Show resolved Hide resolved
docs/client_retries.md Outdated Show resolved Hide resolved
## RPC Retry Configurations
Using Java-Asset v3.41.0 as an example, the default configurations are configured in the following files:

Retry Status Codes are configured [here](https://github.com/googleapis/google-cloud-java/blob/d9da511b4b56302e509abe8b2d919a15ea7dcae7/java-asset/google-cloud-asset/src/main/java/com/google/cloud/asset/v1/stub/AssetServiceStubSettings.java#L1058-L1082)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider linking Retry Status Codes to where you define them later (line 51) - I think you can do that using anchor tags (https://gist.github.com/rachelhyman/b1f109155c9dafffe618)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Let me try and move Configurable Retry Params section up.

docs/client_retries.md Outdated Show resolved Hide resolved
docs/client_retries.md Outdated Show resolved Hide resolved
docs/client_retries.md Outdated Show resolved Hide resolved
Comment on lines 45 to 48
builder
.exportAssetsSettings()
.setRetryableCodes(RETRYABLE_CODE_DEFINITIONS.get("no_retry_0_codes"))
.setRetrySettings(RETRY_PARAM_DEFINITIONS.get("no_retry_0_params"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding how this information is to be used by a developer. Is this section just FYI on how the internals are implemented, or is this information important because developers will need to modify this when X.

(The examples below don't seem to require knowledge of the RETRYABLE_CODE_DEFINITIONS and RETRY_PARAM_DEFINITIONS collections.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended this section to be to show where users can find the default retry configurations. I'll update this section to be clearer.

(The examples below don't seem to require knowledge of the RETRYABLE_CODE_DEFINITIONS and RETRY_PARAM_DEFINITIONS collections.)

Those definitions aren't important and are internal details to gax.

docs/client_retries.md Outdated Show resolved Hide resolved
docs/client_retries.md Outdated Show resolved Hide resolved
@lqiu96
Copy link
Contributor Author

lqiu96 commented Apr 23, 2024

@burkedavison @suztomo @blakeli0 @alicejli Could I get a re-review on this whenever you all get time? Plan is to add details for retries, LRO, endpoint, client initialization, etc. so that we can reference this on future customer issues.

Copy link
Contributor

@alicejli alicejli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for putting together! this will be very helpful to reference for customers. at some point it'd be nice to get a version of this guide onto Cloud RAD.

```
Initial Retry Delay: 100ms
Retry Delay Multiplier: 2.0
Max Retry Delay: 500ms
Copy link
Contributor

@JoeWang1127 JoeWang1127 Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dose this setting cause retry indefinite times?

Copy link
Contributor Author

@lqiu96 lqiu96 Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this current configuration implies infinitely. I'll make that clearer in the example. Thanks!


### Jitter
Jitter is added variance via randomness to spread out when the RPCs are invoked. By default, Google Cloud
Client Libraries enable jitter for retries. When jitter is enabled with exponential backoff, the client libraries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"When jitter is enabled" implies that it can be disabled, but setJittered is now deprecated. Can we change the wording to indicate that Jitter is always enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks!

Copy link
Contributor

@blakeli0 blakeli0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than one minor comment.

@lqiu96 lqiu96 enabled auto-merge (squash) April 24, 2024 20:28
@lqiu96 lqiu96 merged commit 69a4e1b into main Apr 24, 2024
31 checks passed
@lqiu96 lqiu96 deleted the retry-guide branch April 24, 2024 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a guide for RetrySettings
6 participants